In this file, I will plot the model results as forestplots and correlation plots.
First, we load necessary libraries.
library(tidyverse) # A whole bunch of packages necessary for data analysis
Load results from the previous step.
results_scaled_group <- readRDS(file = "../data/processed/results_scaled_group.rds")
Let’s start by gaining some understanding of what type of results we have.
results_scaled_group
# View(results_scaled_group)
This is a data frame with 675 rows and 23 columns. We can verify that with the dim, nrow and ncol functions.
dim(results_scaled_group) # 675 and 23
## [1] 675 23
nrow(results_scaled_group) # 675
## [1] 675
ncol(results_scaled_group) # 23
## [1] 23
This seems right.
What are the names of the columns?
names(results_scaled_group)
## [1] "adj" "variable" "term" "estimate"
## [5] "std.error" "statistic" "p.value" "conf.low"
## [9] "conf.high" "r.squared" "adj.r.squared" "sigma"
## [13] "statistic1" "p.value1" "df" "logLik"
## [17] "AIC" "BIC" "deviance" "df.residual"
## [21] "fdr" "p.value.group" "fdr.group"
The adj variable is the adjustments (or covariates) included in the specific model. We can pull out the variable (or vector) from the data frame using the dollar sign operator. Then we would get all 675 values printed to the screen. Let’s instead print the first five.
results_scaled_group$adj[1:5]
## [1] "None" "None" "None" "None" "None"
And check that the variable/vector is 675 values long.
length(results_scaled_group$adj)
## [1] 675
Correct! How many unique values does this variable hold?
unique(results_scaled_group$adj) # Unique adjustment levels
## [1] "None" "Age and gender"
## [3] "Age, gender and weight change"
Three adjustment levels. And then how many variables do we then have
675 / 3 # The number of variables that we have data for
## [1] 225
Let’s focus on the adjustment level we used in the original paper: “Age, gender and weight change”.
filter(.data = results_scaled_group, adj == "Age, gender and weight change")
%>%)This is the same line:
results_scaled_group %>% filter(adj == "Age, gender and weight change")
Here we used the pipe (%>%), which is from the magrittr package. This operator takes the object on the left and puts it as the first argument in the function on the right.
Next, to reduce the complexity, let’s select the most important columns and toss out the rest.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
# Select the most important columns
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group)
That’s more manageable. Now we have a dataset with 225 rows and 7 columns. But we don’t want to plot all these 225 variables at the same time. We want to focus on those that made up figure 1 in the paper. To filter out those, we first have to do one thing: join with annotation data.
annotation <- readRDS(file = "../data/processed/annotation.rds")
Let’s quickly look at the names of the annotation object.
names(annotation)
## [1] "clinic" "nightingale"
names(annotation$nightingale)
## [1] "name.full" "name.short" "unit"
## [4] "success_%" "type" "type.short"
## [7] "class" "size" "lipid.subclasses"
## [10] "name.pretty" "name.pretty.unit" "name.order"
Look closer at the nightingale part of the annotation object.
annotation$nightingale
It has 225 rows of 12 columns. The name.short column is the same as the variable column in the results table. We can use those to bind together the data frames, and thereby add the annotation.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
# Left join with the Nightingale annotation data frame, and join by the "variable" and "name.short" columns
left_join(annotation$nightingale, by = c("variable" = "name.short"))
Now we have new columns added to our results data frame. Next, we can filter on “Particle concentration”, since we want to visualize these first.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
# Keep rows where "type" is equal to "Particle concentration"
filter(type == "Particle concentration")
Good. Now we only have 14 rows and 33 columns, and we are ready to start plotting.
Note that the preparations we have done here now are not saved. We just print the result to the screen, and that’s it. The file that we originally loaded the table from is completely unaffected.
Now we will build a simple plot gradually.
It’s an iterative process, where we have an initial idea and try to build the best plot possible. We don’t know if the idea worked until we actually see the result in front of us.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
# Initiate ggplot call, and put "variable" and "estimate" on the x and y axes, respectively
ggplot(data = ., mapping = aes(x = variable, y = estimate)) +
# Add a geom_point layer (a geometric point object)
geom_point()
This is a start. We have our variables and our estimates. But what we really want is to also visualize the uncertainty around those estimates. Which geometrical object should we therefore switch to?
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
# Before this point: dplyr specifics (data wrangling)
# Below this point: ggplot2 specifics (plotting)
# Note also that I use "+" and not "%>%" after this point
ggplot(mapping = aes(x = variable, y = estimate)) +
# Switch to geom_pointrange
geom_pointrange()
But there is something wrong! What does the error message say?
Ah, so we might need to add the lower and upper CI boundaries. What were thir names again?
names(results_scaled_group)
## [1] "adj" "variable" "term" "estimate"
## [5] "std.error" "statistic" "p.value" "conf.low"
## [9] "conf.high" "r.squared" "adj.r.squared" "sigma"
## [13] "statistic1" "p.value1" "df" "logLik"
## [17] "AIC" "BIC" "deviance" "df.residual"
## [21] "fdr" "p.value.group" "fdr.group"
They are named “conf.low” and “conf.high”! Perfect. Let’s add those to the geom_pointrange aesthetics.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
# Before this point: dplyr specifics (data wrangling)
# Below this point: ggplot2 specifics
# Note also that I use "+" and not "%>%" after this point
ggplot(mapping = aes(x = variable, y = estimate)) +
# Add a geom_pointrange layer with lower and upper CIs
geom_pointrange(mapping = aes(ymin = conf.low, ymax = conf.high))
But what we really want is to display the variables on the y axis. Flip the axes using “coord_flip”.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
# Before this point: dplyr specifics (data wrangling)
# Below this point: ggplot2 specifics
# Note also that I use "+" and not "%>%" after this point
ggplot(aes(x = variable, y = estimate)) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
# Flip the coordinates
coord_flip()
Nice. But now it becomes apparent that the variable names are ordered alphabetically. That’s really not helpful at all. Use the fct_reorder function to reorder the variables by the “name.order” column.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
# Reorder the variables
mutate(variable = fct_reorder(variable, name.order)) %>%
# Before this point: dplyr specifics (data wrangling)
# Below this point: ggplot2 specifics
# Note also that I use "+" and not "%>%" after this point
ggplot(aes(x = variable, y = estimate)) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
coord_flip()
Better - but it feels natural to keep the largest size high up on the plot, and the smaller sizes lower on the plot. So simply add a minus sign to the ordering to switch it around.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
# Reverse the order (add a minus sign)
mutate(variable = fct_reorder(variable, -name.order)) %>%
# Before this point: dplyr specifics (data wrangling)
# Below this point: ggplot2 specifics
# Note also that I use "+" and not "%>%" after this point
ggplot(aes(x = variable, y = estimate)) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
coord_flip()
Much better. But did you notice the redundancy in the variable names? They all contain an unnecessary -P. Let’s use the names.pretty instead.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
# Do the same, but use the names.pretty variable instead
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
# Before this point: dplyr specifics (data wrangling)
# Below this point: ggplot2 specifics
# Note also that I use "+" and not "%>%" after this point
ggplot(aes(x = name.pretty, y = estimate)) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
coord_flip()
Next, we might want to highlight the significance level, here called p.value.group. Let’s see which unique values this variable has.
unique(results_scaled_group$p.value.group)
## [1] = 0.05 < 0.01 < 0.05 < 0.001
## Levels: = 0.05 < 0.05 < 0.01 < 0.001
This is great. That’s exactly what we want to color by. Add that to the aeshetics to the ggplot call.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate,
# Color by P value group (significance level)
color = p.value.group)) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
coord_flip()
Very cool.
But what if we are unsatisfied by the default colors that ggplot2 supplied? Then we can choose other colors. I usually pick colors from the palettes offered by the RColorBrewer package. To display them, run this line.
RColorBrewer::display.brewer.all(colorblindFriendly = TRUE)
Here we can color many types of continuous (sequential or diverging) or discrete distributions. We have four groups, and will therefore choose a discrete distribution. I like “Dark2”.
So how do we use this knowledge? I usually modify aesthetics using the "scale_*" family of function. For our particular plot, we want to modify the color aesthetic, and will use the function called "scale_color_brewer)
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate,
color = p.value.group)) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
# Modify the color aesthetic
scale_color_brewer(name = "P value", palette = "Dark2") +
coord_flip()
We could also choose exactly which colors we want. To do this, we need to retrieve the actual hex codes that make up the palettes.
Say for example that we want a palette that reflects a traffic light for ‘increasing significance’ (red - yellow - green). In addition, we could use a neutral color for non-significance. Let’s find those colors.
RColorBrewer::display.brewer.all()
For “Set1”, there seems to be something that resembles a traffic light. Let’s get the hex codes for those colors.
RColorBrewer::brewer.pal(n = 9, name = "Set1")
## [1] "#E41A1C" "#377EB8" "#4DAF4A" "#984EA3" "#FF7F00" "#FFFF33" "#A65628"
## [8] "#F781BF" "#999999"
Right - that’s what the hex codes look like. Note that the grey color at the end looks a bit different than the rest. So to select our colors of choice, we can simply subset them. To subset, we can use square-brackets, and supply it a vector of the numerical indices that we wish to retain. We would like to retain:
So our code chunk will then look like this:
RColorBrewer::brewer.pal(n = 9, name = "Set1")[c(9, 1, 6, 3)]
## [1] "#999999" "#E41A1C" "#FFFF33" "#4DAF4A"
Now we have our colors. But how to map those onto the correct P value level? In two steps:
RColorBrewer::brewer.pal(n = 9, name = "Set1")[c(9, 1, 6, 3)] %>%
set_names(c(levels(results_scaled_group$p.value.group)))
## = 0.05 < 0.05 < 0.01 < 0.001
## "#999999" "#E41A1C" "#FFFF33" "#4DAF4A"
Great - let’s save it to a useful object.
mycols_traffic <- RColorBrewer::brewer.pal(n = 9, name = "Set1")[c(9, 1, 6, 3)] %>%
set_names(c(levels(results_scaled_group$p.value.group)))
mycols_traffic
## = 0.05 < 0.05 < 0.01 < 0.001
## "#999999" "#E41A1C" "#FFFF33" "#4DAF4A"
And then supply that vector to scale_color_manual.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate,
color = p.value.group)) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
# Switch to scale_color_manual, and set the correct values
scale_color_manual(name = "P value", values = mycols_traffic) +
coord_flip()
In fact, I want to color the inside of the points only, and leave the confidence intervals black. Then I need to do a few more tweaks:
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate,
# Change "color" to "fill"
fill = p.value.group)) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high),
# Change to a point shape that has both a border and a filling
shape = 21) +
# Switch to scale_fill_manual
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip()
Now most of the actual data-related modifications are in place. However, there are certain other design-related features that we might want to adjust:
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
# Use the classic theme, which is a bit more publication-ready
theme_classic() +
# Rename the y-axis label and remove the x-axis label (it doesn't really provide anything)
labs(y = "Estimate (95 % CI)", x = NULL,
# Add a title
title = "Particle concentration")
To continue the polishing efforts, let’s do a few more things:
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
# Add a horizontal reference line
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
theme_classic() +
# Move the legend to the bottom
theme(legend.position = "bottom") +
labs(y = "Estimate (95 % CI)", x = NULL, title = "Particle concentration")
When the plot area is this narrow, the legend stretches on the outside of the plot. One solution to this issue is to put the legend icons on two rows.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type == "Particle concentration") %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
theme_classic() +
theme(legend.position = "bottom") +
# Put the legend (for the fill aesthetic) on two rows
guides(fill = guide_legend(nrow = 2)) +
labs(y = "Estimate (95 % CI)", x = NULL, title = "Particle concentration")
Finally! I would now say that this plot is complete.
Now that the general plot structure is defined, we can select even more variables, and group these in categories. Actually, we have already prepared all of this in the annotation file. Therefore, (almost) all we have to do is to filter on even more variable types.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
# Filter on more variable types (defined in the annotation file)
filter(type %in% c("Particle concentration", "Apolipoproteins") |
lipid.subclasses == "no" & type %in% c(
"Cholesterol", "Triglycerides", "Phospholipids")
)
Here we have 35 variables. Perfect for a nice forestplot! Let’s try and plot it directly.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
# Filter on more variable types (defined in the annotation file)
filter(type %in% c("Particle concentration", "Apolipoproteins") |
lipid.subclasses == "no" & type %in% c("Cholesterol",
"Triglycerides",
"Phospholipids")) %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
theme_classic() +
theme(legend.position = "bottom") +
guides(fill = guide_legend(nrow = 2)) +
labs(y = "Estimate (95 % CI)", x = NULL)
Alright. The figure is now higher (note that I doubled the fig.height in the code chunk header!).
But something is wrong. We need to group the variables based on the types. To this, we use facets.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type %in% c("Particle concentration", "Apolipoproteins") |
lipid.subclasses == "no" & type %in% c("Cholesterol",
"Triglycerides",
"Phospholipids")) %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
# Create a facet grid by variable types in the rows
facet_grid(rows = vars(type)) +
theme_classic() +
theme(legend.position = "bottom") +
guides(fill = guide_legend(nrow = 2)) +
labs(y = "Estimate (95 % CI)", x = NULL)
This is obviously not what we want. We need to free up the scales and plotting space within each facet.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type %in% c("Particle concentration", "Apolipoproteins") |
lipid.subclasses == "no" & type %in% c("Cholesterol",
"Triglycerides",
"Phospholipids")) %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
facet_grid(rows = vars(type),
# Free up the scales and plotting space within each facet
scales = "free", space = "free") +
theme_classic() +
theme(legend.position = "bottom") +
guides(fill = guide_legend(nrow = 2)) +
labs(y = "Estimate (95 % CI)", x = NULL)
I don’t completely like the ugly black border around the strips. Also, I want to be able to read the strip text without tilting my head (let’s rotate them!). Again we modify non-data ink using themes.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type %in% c("Particle concentration", "Apolipoproteins") |
lipid.subclasses == "no" & type %in% c("Cholesterol",
"Triglycerides",
"Phospholipids")) %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
facet_grid(rows = vars(type), scales = "free", space = "free") +
theme_classic() +
theme(legend.position = "bottom",
# Remove ugly black border around strips
strip.background.y = element_blank(),
# Rotate the strip text so that it's easier to read
strip.text.y = element_text(angle = 0)) +
guides(fill = guide_legend(nrow = 2)) +
labs(y = "Estimate (95 % CI)", x = NULL)
This, unfortunately, makes the figure a bit too wide. There is a lot of whitespace on the right hand side. There are several ways to deal with this, one of which is to simply shorten the faceting names. I’ve already prepared such a variable, so let’s facet by that one instead.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type %in% c("Particle concentration", "Apolipoproteins") |
lipid.subclasses == "no" & type %in% c("Cholesterol",
"Triglycerides",
"Phospholipids")) %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
# Change to "type.short", which displays nicer on the facets
facet_grid(rows = vars(type.short), scales = "free", space = "free") +
theme_classic() +
theme(legend.position = "bottom",
strip.background.y = element_blank(),
strip.text.y = element_text(angle = 0)) +
guides(fill = guide_legend(nrow = 2)) +
labs(y = "Estimate (95 % CI)", x = NULL)
That’s better. We can just add the whole facet/group name in an abbreviation list.
In a final tweak, I want to manually switch the order in which a few of the variables are presented. For example, I often want to show ApoB before ApoA1, and I want to move the variables EstC, FreeC, and Remnant-C above the specific particles, within the Cs/Cholesterol facet. We can do all this before we plot the figure, inside the call to mutate.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type %in% c("Particle concentration", "Apolipoproteins") |
lipid.subclasses == "no" & type %in% c("Cholesterol",
"Triglycerides",
"Phospholipids")) %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order) %>%
# Relevel a few specific variables
fct_relevel(
# Pull ApoB to the start of it's panel
"ApoB",
# Pull *EstC*, *FreeC*, and *Remnant-C* to the start of their panel (after "Serum-C")
"Remnant-C", "FreeC", "EstC", "Serum-C",
after = Inf
)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
facet_grid(rows = vars(type.short), scales = "free", space = "free") +
theme_classic() +
theme(legend.position = "bottom",
strip.background.y = element_blank(),
strip.text.y = element_text(angle = 0)) +
guides(fill = guide_legend(nrow = 2)) +
labs(y = "Estimate (95 % CI)", x = NULL)
And THAT … I consider a nice plot. Here is the final code snippet.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
filter(type %in% c("Particle concentration", "Apolipoproteins") |
lipid.subclasses == "no" & type %in% c("Cholesterol", "Triglycerides", "Phospholipids")) %>%
mutate(name.pretty = fct_reorder(name.pretty, -name.order) %>%
fct_relevel("ApoB", "Remnant-C", "FreeC", "EstC", "Serum-C", after = Inf)) %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
facet_grid(rows = vars(type.short), scales = "free", space = "free") +
theme_classic() +
theme(legend.position = "bottom",
strip.background.y = element_blank(),
strip.text.y = element_text(angle = 0)) +
guides(fill = guide_legend(nrow = 2)) +
labs(y = "Estimate (95 % CI)", x = NULL)
Let’s save this figure using ggsave. This function remembers the last plot that you produced (here, in the RMarkdown document), and then saves that to a file that you specify.
Use this code to save PNG (raster graphics) files.
ggsave(filename = "../results/figures/forest-lipid-classes.png",
width = 5, height = 8, dpi = 300)
Use this code to save PDF (vector graphics) files.
ggsave(filename = "../results/figures/forest-lipid-classes.pdf",
width = 5, height = 8, device = cairo_pdf)
Note that I set the device to cairo_pdf from the Cairo package. This is useful when we save PDF files, because it can render special characters, like the more-than-or-equal sign (>=, ≥) that I use to signify more-than-or-equal-to 0.05 (for P values) or 0.20 (for FDR values). If I use the regular pdf device, this would look like a equal sign (=), which is obviously incorrect.
results_scaled_group %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate, conf.low, conf.high, p.value, p.value.group, fdr.group) %>%
left_join(annotation$nightingale, by = c("variable" = "name.short")) %>%
# I filter on variable types that I wish to show...
filter(type %in% c("Fatty acids",
"Glucose metabolism",
"AA, other",
"AA, aromatic",
"AA, branched-chain",
"Ketone bodies",
"Miscellaneous"),
unit != "%") %>%
# ...And relevel some of them manually!
mutate(name.pretty = fct_reorder(name.pretty, -name.order) %>%
fct_relevel("DHA", "FAw3", "LA", "FAw6", "PUFA", "MUFA", "SFA", "TotFA", after = Inf)) %>%
# And if I wish to remove the UnSat variable, just filter it away, like so:
filter(name.pretty != "UnSat") %>%
ggplot(aes(x = name.pretty, y = estimate, fill = p.value.group)) +
geom_hline(yintercept = 0, color = "grey60", linetype = "dashed") +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high), shape = 21) +
scale_fill_manual(name = "P value", values = mycols_traffic) +
coord_flip() +
facet_grid(rows = vars(type.short), scales = "free", space = "free") +
theme_classic() +
theme(legend.position = "bottom",
strip.background.y = element_blank(),
strip.text.y = element_text(angle = 0)) +
guides(fill = guide_legend(nrow = 2)) +
labs(y = "Estimate (95 % CI)", x = NULL)
A main point of the NoMa study is that exchanging SFA by PUFA will reduce LDL-C. And indeed, this is what happened is the intervention: the intervention arm reduced the TC and LDL-C by ~10 %, while there was no change in the control arm. Naturally, we would assume that changes in the Nigtingale metabolome associates strongly with changes in LDL-C. One way to test this is to look at the cross-sectional association between LDL-C and Nightingale metabolites at baseline, and whether these associations are “normalized” by the intervention. To test this hypothesis in a global manner, we can then plot a scatteplot between the baseline, cross-sectional \(\beta\) coefficients (x axis), and the intervention-based group effect \(\beta\) coefficients (y axis).
Therefore, we have prepared linear regression results with LDL-C as the exposure (at baseline), with adjustment for age, gender and BMI.
results_scaled_ldlc <- readRDS(file = "../data/processed/results_scaled_ldlc.rds")
Let’s select only the estimates, and join the estimates from the (scaled) group and continuous LDL-C analyses.
estimates_xy <- left_join(
results_scaled_group %>%
ungroup() %>%
filter(adj == "Age, gender and weight change") %>%
select(variable, estimate),
results_scaled_ldlc %>%
select(variable, estimate),
by = "variable",
suffix = c("_group", "_ldlc")
)
Next, run a linear regression for all 225 variables, and pull out those that have standardized residuals less than -1.5 or higher than 1.5. Then plot the scatterplot for all variables, and highlight those with aberrant residuals.
high_resids <- estimates_xy %>%
as.data.frame() %>% column_to_rownames("variable") %>%
lm(estimate_ldlc ~ estimate_group, data = .) %>%
broom::augment() %>%
filter(.std.resid > 1.5 | .std.resid < -1.5) %>%
pull(".rownames")
estimates_xy %>%
ggplot(aes(estimate_group, estimate_ldlc)) +
geom_smooth(method = "lm", color = "black", size = 0.5) +
geom_point(shape = 21, fill = "lightblue", color = "grey50", size = 3) +
ggrepel::geom_text_repel(
data = . %>% filter(variable %in% high_resids),
aes(label = variable), size = 2.5, force = 0.5, box.padding = unit(0.1, "lines")) +
theme_classic() +
labs(x = "Difference between intervention and control group at end-of-study",
y = "Association with LDL-C at baseline",
caption = "Data: all")
Definitely a clear negative association, just as expected. However, it seems that there are a lot of ‘outliers’, and it seems many of them are percentages or ratios. Let’s try and remove those and plot the figure again.
high_resids_sub <- estimates_xy %>%
filter(str_detect(variable, "%|/", negate = TRUE)) %>%
as.data.frame() %>% column_to_rownames("variable") %>%
lm(estimate_ldlc ~ estimate_group, data = .) %>%
broom::augment() %>%
filter(.std.resid > 1.5 | .std.resid < -1.5) %>%
pull(".rownames")
estimates_xy %>%
filter(str_detect(variable, "%|/", negate = TRUE)) %>%
ggplot(aes(estimate_group, estimate_ldlc)) +
geom_smooth(method = "lm", color = "black", size = 0.5) +
geom_point(shape = 21, fill = "lightblue", color = "grey50", size = 3) +
ggrepel::geom_text_repel(
data = . %>% filter(variable %in% high_resids_sub),
aes(label = variable), size = 2.5, force = 0.5, box.padding = unit(0.1, "lines")) +
theme_classic() +
labs(x = "Difference between intervention and control group at end-of-study",
y = "Association with LDL-C at baseline",
caption = "Data: Percentages and ratios removed")
This figure indeed confirms our suspicion: the differences in the intervention group at end-of-study are strongly linked to the associations with LDL-C at baseline. Some key insights are:
This concludes the script. Some conclusions and take-homes:
To improve reproducibility, print out the session info for this script.
devtools::session_info()
## - Session info ----------------------------------------------------------
##
## - Packages --------------------------------------------------------------
## package * version date lib source
## acepack 1.4.1 2016-10-29 [1] CRAN (R 3.6.0)
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
## backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
## base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.6.0)
## broom * 0.5.2 2019-04-07 [1] CRAN (R 3.6.0)
## callr 3.3.0 2019-07-04 [1] CRAN (R 3.6.1)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0)
## checkmate 1.9.4 2019-07-04 [1] CRAN (R 3.6.1)
## class 7.3-15 2019-01-01 [2] CRAN (R 3.6.0)
## cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
## cluster 2.0.8 2019-04-05 [2] CRAN (R 3.6.0)
## colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
## data.table 1.12.2 2019-04-07 [1] CRAN (R 3.6.0)
## DBI 1.0.0 2018-05-02 [1] CRAN (R 3.6.0)
## desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
## devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.1)
## digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.1)
## dplyr * 0.8.2 2019-06-29 [1] CRAN (R 3.6.0)
## e1071 1.7-2 2019-06-05 [1] CRAN (R 3.6.0)
## ellipsis 0.2.0.1 2019-07-02 [1] CRAN (R 3.6.1)
## evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
## fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
## farver 1.1.0 2018-11-20 [1] CRAN (R 3.6.1)
## forcats * 0.4.0 2019-02-17 [1] CRAN (R 3.6.0)
## foreign 0.8-71 2018-07-20 [2] CRAN (R 3.6.0)
## Formula 1.2-3 2018-05-03 [1] CRAN (R 3.6.0)
## fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
## generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
## ggforce * 0.3.1 2019-08-20 [1] CRAN (R 3.6.1)
## ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.1)
## ggrepel 0.8.1 2019-05-07 [1] CRAN (R 3.6.1)
## glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
## gridExtra * 2.3 2017-09-09 [1] CRAN (R 3.6.1)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
## haven * 2.1.1 2019-07-04 [1] CRAN (R 3.6.1)
## highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
## Hmisc 4.2-0 2019-01-26 [1] CRAN (R 3.6.0)
## hms 0.5.0 2019-07-09 [1] CRAN (R 3.6.0)
## htmlTable 1.13.1 2019-01-07 [1] CRAN (R 3.6.0)
## htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0)
## htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.6.0)
## httr 1.4.0 2018-12-11 [1] CRAN (R 3.6.0)
## inline 0.3.15 2018-05-18 [1] CRAN (R 3.6.1)
## jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.0)
## knitr * 1.23 2019-05-18 [1] CRAN (R 3.6.0)
## labeling 0.3 2014-08-23 [1] CRAN (R 3.6.0)
## labelled 2.2.1 2019-05-26 [1] CRAN (R 3.6.1)
## lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.0)
## latticeExtra 0.6-28 2016-02-09 [1] CRAN (R 3.6.0)
## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
## lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.1)
## loo 2.1.0 2019-03-13 [1] CRAN (R 3.6.1)
## lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0)
## magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
## MASS 7.3-51.4 2019-03-31 [2] CRAN (R 3.6.0)
## Matrix 1.2-17 2019-03-22 [2] CRAN (R 3.6.0)
## matrixStats 0.54.0 2018-07-23 [1] CRAN (R 3.6.1)
## memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
## mitools 2.4 2019-04-26 [1] CRAN (R 3.6.1)
## modelr 0.1.4 2019-02-18 [1] CRAN (R 3.6.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
## nlme 3.1-139 2019-04-09 [2] CRAN (R 3.6.0)
## nnet 7.3-12 2016-02-02 [2] CRAN (R 3.6.0)
## openxlsx * 4.1.2 2019-10-29 [1] CRAN (R 3.6.1)
## packrat 0.5.0 2018-11-14 [1] CRAN (R 3.6.1)
## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 3.6.0)
## pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
## pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0)
## pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0)
## pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
## plyr 1.8.4 2016-06-08 [1] CRAN (R 3.6.0)
## polyclip 1.10-0 2019-03-14 [1] CRAN (R 3.6.0)
## prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
## processx 3.4.0 2019-07-03 [1] CRAN (R 3.6.1)
## ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
## purrr * 0.3.2 2019-03-15 [1] CRAN (R 3.6.0)
## R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
## RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.6.0)
## Rcpp 1.0.1 2019-03-17 [1] CRAN (R 3.6.0)
## readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
## readxl * 1.3.1 2019-03-13 [1] CRAN (R 3.6.0)
## remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
## reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.6.0)
## rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
## rmarkdown * 1.13 2019-05-22 [1] CRAN (R 3.6.0)
## rpart 4.1-15 2019-04-12 [2] CRAN (R 3.6.0)
## rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
## rstan 2.19.2 2019-07-09 [1] CRAN (R 3.6.1)
## rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
## rvest 0.3.4 2019-05-15 [1] CRAN (R 3.6.0)
## scales 1.0.0 2018-08-09 [1] CRAN (R 3.6.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
## StanHeaders 2.19.0 2019-09-07 [1] CRAN (R 3.6.1)
## stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
## survey 3.36 2019-04-27 [1] CRAN (R 3.6.1)
## survival 2.44-1.1 2019-04-01 [2] CRAN (R 3.6.0)
## tableone 0.10.0 2019-02-17 [1] CRAN (R 3.6.1)
## testthat 2.1.1 2019-04-23 [1] CRAN (R 3.6.1)
## tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
## tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.6.1)
## tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
## tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.6.1)
## tinytex 0.14 2019-06-25 [1] CRAN (R 3.6.0)
## tweenr 1.0.1 2018-12-14 [1] CRAN (R 3.6.1)
## usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.1)
## utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0)
## vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.1)
## withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
## xfun 0.8 2019-06-25 [1] CRAN (R 3.6.0)
## xml2 1.2.0 2018-01-24 [1] CRAN (R 3.6.0)
## yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
## zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
## zip 2.0.4 2019-09-01 [1] CRAN (R 3.6.1)
## zoo 1.8-6 2019-05-28 [1] CRAN (R 3.6.1)
##
## [1] C:/Users/jacobjc/Documents/R/win-library/3.6
## [2] C:/Program Files/R/R-3.6.0/library